
The features of the Kaggle dataset are:
- id: Case ID
- name: Name of the victim (nominal)
- date: Date of the shooting
- manner_of_death: Whether the victim was shot, or shot and Tasered
- armed: Type of weapon the victim was armed with
- age: Age of the victim
- gender: Gender of the victim, M (male) or F (female)
- race: Race of the victim, one of 6 categories: Asian, Black, White, Native, Hispanic, and Other
- city: City where the victim was shot
- state: State where the victim was shot
- signs_of_mental_illness: Whether the victim showed signs of mental illness (True or False)
- threat_level: Threat level recorded for the victim: attack, other, or undetermined
- flee: How the victim fled: Foot, Car, Not fleeing, or Other
- body_camera: Whether the police had a body camera on (True or False)
- arms_category: Category of the weapon the victim had
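The value sets listed above can be checked against the loaded file before any encoding. The sketch below is a minimal example, assuming the exact spellings that appear in the outputs later in this notebook (for example `'shot and Tasered'` and lowercase `'attack'`, `'other'`, `'undetermined'`); `EXPECTED` and `unexpected_values` are hypothetical names, not part of the dataset or any library.

```python
import pandas as pd

# Documented value sets (assumed spellings, taken from outputs in this notebook)
EXPECTED = {
    'manner_of_death': {'shot', 'shot and Tasered'},
    'gender': {'M', 'F'},
    'race': {'Asian', 'Black', 'White', 'Native', 'Hispanic', 'Other'},
    'threat_level': {'attack', 'other', 'undetermined'},
    'flee': {'Car', 'Foot', 'Not fleeing', 'Other'},
}

def unexpected_values(df, expected=EXPECTED):
    """Return {column: values not in the documented set}, skipping absent columns."""
    report = {}
    for col, allowed in expected.items():
        if col not in df.columns:
            continue
        extra = set(df[col].dropna().unique()) - allowed
        if extra:
            report[col] = extra
    return report

toy = pd.DataFrame({'gender': ['M', 'F', 'X'], 'flee': ['Car', 'Foot', 'Car']})
report = unexpected_values(toy)
```

On the real data this would be called as `unexpected_values(pd.read_csv('shootings.csv'))`; an empty dict means every listed column matches its documented categories.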
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
import plotly.graph_objs as go #importing graphical objects
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import classification_report, confusion_matrix
df = pd.read_csv('shootings.csv')
shooting =df
shooting.describe()
| | id | age |
|---|---|---|
| count | 4895.000000 | 4895.000000 |
| mean | 2902.148519 | 36.549750 |
| std | 1683.467910 | 12.694348 |
| min | 3.000000 | 6.000000 |
| 25% | 1441.500000 | 27.000000 |
| 50% | 2847.000000 | 35.000000 |
| 75% | 4352.500000 | 45.000000 |
| max | 5925.000000 | 91.000000 |
shooting.groupby(['race']).size()
race
Asian         93
Black       1298
Hispanic     902
Native        78
Other         48
White       2476
dtype: int64
shooting['shooting count'] = shooting['name'].groupby(shooting['state']).transform('count')
shooting.head(5)
| | id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | arms_category | shooting count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | True | attack | Not fleeing | False | Guns | 126 |
| 1 | 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | False | attack | Not fleeing | False | Guns | 76 |
| 2 | 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | False | other | Not fleeing | False | Unarmed | 49 |
| 3 | 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | True | attack | Not fleeing | False | Other unusual objects | 701 |
| 4 | 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | Hispanic | Evans | CO | False | attack | Not fleeing | False | Piercing objects | 168 |
shooting.groupby('flee').size()
flee
Car             820
Foot            642
Not fleeing    3073
Other           360
dtype: int64
state_shooting= shooting['shooting count'].groupby(shooting['state']).value_counts()
state_shooting.sort_values(ascending = False)
state  shooting count
CA     701               701
TX     426               426
FL     324               324
AZ     222               222
CO     168               168
GA     161               161
OK     151               151
NC     148               148
OH     146               146
WA     126               126
TN     125               125
MO     124               124
LA     102               102
IL     99                 99
AL     95                 95
PA     95                 95
NM     93                 93
VA     92                 92
IN     91                 91
NY     90                 90
WI     88                 88
KY     87                 87
NV     85                 85
SC     80                 80
MD     77                 77
OR     76                 76
AR     73                 73
MI     71                 71
MS     61                 61
MN     60                 60
NJ     60                 60
UT     58                 58
KS     49                 49
WV     46                 46
ID     37                 37
AK     36                 36
MA     33                 33
IA     31                 31
HI     29                 29
MT     29                 29
NE     24                 24
ME     21                 21
CT     20                 20
SD     14                 14
DC     13                 13
WY     13                 13
NH     12                 12
ND     11                 11
DE     10                 10
VT     8                   8
RI     4                   4
Name: shooting count, dtype: int64
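Because the `shooting count` column already holds each state's total, grouping it again by state and calling `value_counts()` only repeats that total in the index. A shorter route to the same per-state counts is `value_counts()` on the `state` column itself. The sketch below shows this on a small made-up frame; `cases_per_state` is a hypothetical helper, and the second line reproduces the per-row `transform('count')` pattern used above for comparison.

```python
import pandas as pd

def cases_per_state(df):
    """Number of rows (cases) per state, largest first."""
    return df['state'].value_counts()

toy = pd.DataFrame({'state': ['CA', 'TX', 'CA', 'WA', 'CA', 'TX']})
counts = cases_per_state(toy)
# per-row totals, built the same way as the 'shooting count' column
per_row = toy['state'].groupby(toy['state']).transform('count')
```

Mapping `counts` back onto the rows (`toy['state'].map(counts)`) gives exactly the per-row column, so either form can feed the choropleth.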
data = dict(type='choropleth',
            locations=shooting['state'],
            locationmode='USA-states',
            colorscale='Reds',
            z=shooting['shooting count'],
            colorbar={'title': "Cases Count"})
layout = dict(title='Shooting Cases Per State in USA',
              geo=dict(scope='usa'))
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
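The trace above passes one location per row (4,895 entries for about 51 states), each carrying its state's total. A possibly cleaner variant, assuming the same dict-based trace format, aggregates first so each state appears exactly once. `choropleth_data` is a hypothetical helper; only pandas is needed to build the dict.

```python
import pandas as pd

def choropleth_data(df):
    """Build a plotly choropleth trace dict with one entry per state."""
    counts = df['state'].value_counts()
    return dict(type='choropleth',
                locations=counts.index.tolist(),
                locationmode='USA-states',
                colorscale='Reds',
                z=counts.values.tolist(),
                colorbar={'title': 'Cases Count'})

toy = pd.DataFrame({'state': ['CA', 'CA', 'OR']})
d = choropleth_data(toy)
```

It could then be plotted with `iplot(go.Figure(data=[choropleth_data(shooting)], layout=layout))`.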
plt.figure(figsize = (13,8))
sns.set_style('whitegrid')
shooting['age'].hist(bins=17, range=[10, 90], color = 'salmon')
plt.xlabel('age')
plt.ylabel('count')
plt.title('Shooting Case per Age Distribution')
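With `range=[10, 90]`, the histogram leaves out victims younger than 10 or older than 90 (`describe()` above reports a minimum age of 6 and a maximum of 91), and 17 bins over 80 years makes each bin about 4.7 years wide. A sketch of fixed 5-year bins covering the full age range, using `np.histogram` to expose the counts the plot would draw; `age_bin_counts` is a hypothetical helper.

```python
import numpy as np

def age_bin_counts(ages, width=5, upper=100):
    """Count ages in [0, width), [width, 2*width), ... up to `upper` (last bin closed)."""
    edges = np.arange(0, upper + width, width)
    counts, edges = np.histogram(ages, bins=edges)
    return counts, edges

counts, edges = age_bin_counts([6, 23, 27, 91])
```

The same edges can be passed to the plot, e.g. `shooting['age'].hist(bins=np.arange(0, 105, 5), color='salmon')`.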
plt.figure(figsize = (13,8))
sns.countplot(x='gender',data= shooting,palette='coolwarm')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Shooting Case per Gender Distribution')
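The countplot shows raw counts; the share of cases per gender is often easier to quote. A small sketch using `value_counts(normalize=True)` on made-up data:

```python
import pandas as pd

gender = pd.Series(['M', 'M', 'F', 'M'])
# fraction of rows in each category, largest first
shares = gender.value_counts(normalize=True)
```

On this notebook's data the equivalent call would be `shooting['gender'].value_counts(normalize=True)`, run before `gender` is mapped to 0/1.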
# build a word cloud from the already-loaded 'armed' column (no need to re-read the CSV)
text = ' '.join(shooting['armed'].dropna().astype(str))
wordcloud = WordCloud(background_color="white", max_font_size=100).generate(text)
plt.figure(figsize=(12, 12))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
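Joining the `armed` labels with spaces hands WordCloud plain text, so a multi-word label such as `toy weapon` is tokenized into separate words (WordCloud's default collocation handling may pair some of them back up). One alternative counts whole labels and passes them to `WordCloud.generate_from_frequencies`. In the sketch below `label_frequencies` is a hypothetical helper, and only the counting step runs, so the example does not depend on the wordcloud package.

```python
from collections import Counter

def label_frequencies(labels):
    """Count each whole label, so 'toy weapon' stays a single entry."""
    return dict(Counter(str(label) for label in labels))

freqs = label_frequencies(['gun', 'toy weapon', 'gun', 'knife'])
# freqs could then be drawn with, for example:
# WordCloud(background_color="white").generate_from_frequencies(freqs)
```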
shooting
| | id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | arms_category | shooting count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | True | attack | Not fleeing | False | Guns | 126 |
| 1 | 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | False | attack | Not fleeing | False | Guns | 76 |
| 2 | 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | False | other | Not fleeing | False | Unarmed | 49 |
| 3 | 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | True | attack | Not fleeing | False | Other unusual objects | 701 |
| 4 | 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | Hispanic | Evans | CO | False | attack | Not fleeing | False | Piercing objects | 168 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | 5916 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | M | Black | Atlanta | GA | False | attack | Foot | True | Electrical devices | 161 |
| 4891 | 5925 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | M | Black | Crown Point | IN | False | attack | Car | False | Guns | 91 |
| 4892 | 5918 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | F | White | Sedalia | MO | False | other | Not fleeing | False | Unarmed | 124 |
| 4893 | 5921 | William Slyter | 2020-06-13 | shot | gun | 22.0 | M | White | Kansas City | MO | False | other | Other | False | Guns | 124 |
| 4894 | 5924 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | M | White | Lawrence | KS | False | attack | Car | False | Guns | 49 |
4895 rows × 16 columns
shooting['armed'].value_counts().sort_values(ascending=False).head(20)
gun                2755
knife               708
unknown             418
unarmed             348
toy weapon          171
vehicle             120
machete              39
Taser                24
sword                22
ax                   21
baseball bat         16
gun and knife        15
hammer               14
metal pipe           12
screwdriver          12
sharp object         11
box cutter           11
hatchet              11
gun and vehicle      10
gun and car           9
Name: armed, dtype: int64
top_10_labels = shooting['armed'].value_counts().head(10).index.tolist()
top_10_labels
['gun', 'knife', 'unknown', 'unarmed', 'toy weapon', 'vehicle', 'machete', 'Taser', 'sword', 'ax']
def one_hot_encoding_top_x(shooting, variable, top_x_labels):
    # create a 0/1 dummy column for each of the most frequent labels;
    # the number of labels encoded is set by top_x_labels.
    # The DataFrame passed in is modified in place.
    for label in top_x_labels:
        shooting[variable + '_' + label] = np.where(shooting[variable] == label, 1, 0)
# read the raw data again (this drops the 'shooting count' column added earlier)
shooting = pd.read_csv('shootings.csv')
# encode armed into the 10 most frequent categories
one_hot_encoding_top_x(shooting, 'armed', top_10_labels)
shooting.head()
| | id | name | date | manner_of_death | armed | age | gender | race | city | state | ... | armed_gun | armed_knife | armed_unknown | armed_unarmed | armed_toy weapon | armed_vehicle | armed_machete | armed_Taser | armed_sword | armed_ax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | Hispanic | Evans | CO | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 25 columns
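`one_hot_encoding_top_x` writes into the DataFrame it receives, which is why the CSV has to be re-read to start over. A sketch of a variant that returns the dummy columns as a new frame instead, so the caller decides what to concatenate. `top_x_dummies` is a hypothetical name; as with the original helper, rows whose label is outside `top_x_labels` get 0 in every column.

```python
import numpy as np
import pandas as pd

def top_x_dummies(df, variable, top_x_labels):
    """Return a new frame with one 0/1 column per label in top_x_labels."""
    return pd.DataFrame(
        {f'{variable}_{label}': np.where(df[variable] == label, 1, 0)
         for label in top_x_labels},
        index=df.index,
    )

toy = pd.DataFrame({'armed': ['gun', 'knife', 'nail gun', 'gun']})
d = top_x_dummies(toy, 'armed', ['gun', 'knife'])
```

Usage on the notebook's data: `shooting = pd.concat([shooting, top_x_dummies(shooting, 'armed', top_10_labels)], axis=1)`.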
shooting['arms_category'].value_counts().sort_values(ascending=False).head(20)
Guns                     2764
Sharp objects             818
Unknown                   418
Unarmed                   348
Other unusual objects     192
Blunt instruments         122
Vehicles                  121
Multiple                   54
Piercing objects           29
Electrical devices         24
Explosives                  4
Hand tools                  1
Name: arms_category, dtype: int64
top_10_labels = shooting['arms_category'].value_counts().head(10).index.tolist()
top_10_labels
['Guns', 'Sharp objects', 'Unknown', 'Unarmed', 'Other unusual objects', 'Blunt instruments', 'Vehicles', 'Multiple', 'Piercing objects', 'Electrical devices']
# read the raw data again (this discards the armed_* one-hot columns created above)
shooting = pd.read_csv('shootings.csv')
# encode arms_category into the 10 most frequent categories
one_hot_encoding_top_x(shooting, 'arms_category', top_10_labels)
shooting.head()
| | id | name | date | manner_of_death | armed | age | gender | race | city | state | ... | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | arms_category_Unarmed | arms_category_Other unusual objects | arms_category_Blunt instruments | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | Hispanic | Evans | CO | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
5 rows × 25 columns
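With the top 10 categories encoded, rows in the two remaining categories (`Explosives`, 4 rows, and `Hand tools`, 1 row) get 0 in every `arms_category_*` column. A common alternative, sketched here as an assumption rather than what this notebook does, folds the rare labels into a single bucket before one-hot encoding so every row has exactly one 1. `collapse_rare` and the `'Rare'` label are hypothetical.

```python
import pandas as pd

def collapse_rare(series, top_labels, other='Rare'):
    """Keep labels in top_labels; replace every other value with `other`."""
    return series.where(series.isin(top_labels), other)

s = pd.Series(['Guns', 'Explosives', 'Hand tools', 'Guns'])
out = collapse_rare(s, ['Guns'])
d = pd.get_dummies(out, prefix='arms_category')
```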
shooting = shooting.drop('id', axis = 1)
shooting
| | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | ... | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | arms_category_Unarmed | arms_category_Other unusual objects | arms_category_Blunt instruments | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | True | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | False | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | True | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | Hispanic | Evans | CO | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | M | Black | Atlanta | GA | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4891 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | M | Black | Crown Point | IN | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | F | White | Sedalia | MO | False | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4893 | William Slyter | 2020-06-13 | shot | gun | 22.0 | M | White | Kansas City | MO | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4894 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | M | White | Lawrence | KS | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4895 rows × 24 columns
shooting['gender'] = shooting['gender'].map({'M':0,'F':1})
shooting
| | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | ... | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | arms_category_Unarmed | arms_category_Other unusual objects | arms_category_Blunt instruments | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | 0 | Asian | Shelton | WA | True | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | 0 | White | Aloha | OR | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | 0 | Hispanic | Wichita | KS | False | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | 0 | White | San Francisco | CA | True | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | 0 | Hispanic | Evans | CO | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | 0 | Black | Atlanta | GA | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4891 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | 0 | Black | Crown Point | IN | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | 1 | White | Sedalia | MO | False | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4893 | William Slyter | 2020-06-13 | shot | gun | 22.0 | 0 | White | Kansas City | MO | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4894 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | 0 | White | Lawrence | KS | False | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4895 rows × 24 columns
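`Series.map` with a dict returns NaN for any value that is not a key, so a stray spelling would silently become NaN instead of raising an error. A minimal sketch of a stricter wrapper (the hypothetical `encode_binary`) that raises on non-missing values the mapping does not cover; it could stand in for the `.map(...)` calls used for `gender`, `manner_of_death`, and the other binary columns in this notebook.

```python
import pandas as pd

def encode_binary(series, mapping):
    """Map values via `mapping`, raising if a non-missing value is unmapped."""
    encoded = series.map(mapping)
    unmapped = series[encoded.isna() & series.notna()].unique()
    if len(unmapped):
        raise ValueError(f'unmapped values: {list(unmapped)}')
    return encoded

ok = encode_binary(pd.Series(['M', 'F', 'M']), {'M': 0, 'F': 1})
```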
shooting.groupby('race').size()
race
Asian         93
Black       1298
Hispanic     902
Native        78
Other         48
White       2476
dtype: int64
race_dummies = pd.get_dummies(shooting['race'], drop_first=False)
shooting = pd.concat([shooting,race_dummies],axis=1)
shooting
| | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | ... | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices | Asian | Black | Hispanic | Native | Other | White |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | 0 | Asian | Shelton | WA | True | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | 0 | White | Aloha | OR | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | 0 | Hispanic | Wichita | KS | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | 0 | White | San Francisco | CA | True | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | 0 | Hispanic | Evans | CO | False | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | 0 | Black | Atlanta | GA | False | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | 0 | Black | Crown Point | IN | False | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | 1 | White | Sedalia | MO | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4893 | William Slyter | 2020-06-13 | shot | gun | 22.0 | 0 | White | Kansas City | MO | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4894 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | 0 | White | Lawrence | KS | False | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
4895 rows × 30 columns
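With `drop_first=False`, the six race columns always sum to 1 in every row, so any one of them is determined by the others. Tree models are unaffected, and scikit-learn's `LogisticRegression` applies L2 regularization by default, but an unregularized linear model would face perfect collinearity with its intercept. `drop_first=True` keeps the first category (alphabetically) as the baseline, as this small example on made-up values shows:

```python
import pandas as pd

race = pd.Series(['Asian', 'Black', 'White', 'White'])
full = pd.get_dummies(race)                      # one column per category
reduced = pd.get_dummies(race, drop_first=True)  # 'Asian' becomes the baseline
```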
shooting = shooting.drop('race', axis = 1)
shooting
| | name | date | manner_of_death | armed | age | gender | city | state | signs_of_mental_illness | threat_level | ... | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices | Asian | Black | Hispanic | Native | Other | White |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | 0 | Shelton | WA | True | attack | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | 0 | Aloha | OR | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | 0 | Wichita | KS | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | 0 | San Francisco | CA | True | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | 0 | Evans | CO | False | attack | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | 0 | Atlanta | GA | False | attack | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | 0 | Crown Point | IN | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | 1 | Sedalia | MO | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4893 | William Slyter | 2020-06-13 | shot | gun | 22.0 | 0 | Kansas City | MO | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4894 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | 0 | Lawrence | KS | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
4895 rows × 29 columns
# encode manner_of_death: 0 = shot, 1 = shot and Tasered
shooting['manner_of_death'] = shooting['manner_of_death'].map({'shot':0, 'shot and Tasered':1})
shooting
| | name | date | manner_of_death | armed | age | gender | city | state | signs_of_mental_illness | threat_level | ... | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices | Asian | Black | Hispanic | Native | Other | White |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | 0 | gun | 53.0 | 0 | Shelton | WA | True | attack | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | 0 | gun | 47.0 | 0 | Aloha | OR | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | John Paul Quintero | 2015-01-03 | 1 | unarmed | 23.0 | 0 | Wichita | KS | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | San Francisco | CA | True | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | Michael Rodriguez | 2015-01-04 | 0 | nail gun | 39.0 | 0 | Evans | CO | False | attack | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | 0 | Taser | 27.0 | 0 | Atlanta | GA | False | attack | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | 0 | gun | 23.0 | 0 | Crown Point | IN | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | 0 | unarmed | 25.0 | 1 | Sedalia | MO | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4893 | William Slyter | 2020-06-13 | 0 | gun | 22.0 | 0 | Kansas City | MO | False | other | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4894 | Nicholas Hirsh | 2020-06-15 | 0 | gun | 31.0 | 0 | Lawrence | KS | False | attack | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
4895 rows × 29 columns
shooting['signs_of_mental_illness'] = shooting['signs_of_mental_illness'].map({False:0, True:1}).astype(int)
shooting['threat_level'] = shooting['threat_level'].map({'attack':1, 'other':0, 'undetermined':2})
shooting
| | name | date | manner_of_death | armed | age | gender | city | state | signs_of_mental_illness | threat_level | ... | arms_category_Vehicles | arms_category_Multiple | arms_category_Piercing objects | arms_category_Electrical devices | Asian | Black | Hispanic | Native | Other | White |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | 0 | gun | 53.0 | 0 | Shelton | WA | 1 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | 0 | gun | 47.0 | 0 | Aloha | OR | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | John Paul Quintero | 2015-01-03 | 1 | unarmed | 23.0 | 0 | Wichita | KS | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | San Francisco | CA | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | Michael Rodriguez | 2015-01-04 | 0 | nail gun | 39.0 | 0 | Evans | CO | 0 | 1 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | 0 | Taser | 27.0 | 0 | Atlanta | GA | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | 0 | gun | 23.0 | 0 | Crown Point | IN | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | 0 | unarmed | 25.0 | 1 | Sedalia | MO | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4893 | William Slyter | 2020-06-13 | 0 | gun | 22.0 | 0 | Kansas City | MO | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4894 | Nicholas Hirsh | 2020-06-15 | 0 | gun | 31.0 | 0 | Lawrence | KS | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
4895 rows × 29 columns
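Mapping `threat_level` to `{'attack': 1, 'other': 0, 'undetermined': 2}` gives a nominal variable an order (other < attack < undetermined) that a linear model will treat as meaningful. A one-hot alternative, sketched with `pd.get_dummies` on made-up values and a `threat` prefix chosen here for illustration:

```python
import pandas as pd

threat = pd.Series(['attack', 'other', 'undetermined', 'attack'])
# one 0/1 column per threat level, no implied ordering
threat_dummies = pd.get_dummies(threat, prefix='threat').astype(int)
```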
shooting.groupby('flee').size()
flee
Car             820
Foot            642
Not fleeing    3073
Other           360
dtype: int64
flee_dummies = pd.get_dummies(shooting['flee'], drop_first=False)
shooting = pd.concat([shooting,flee_dummies],axis=1)
shooting = shooting.drop('flee', axis =1)
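`race` and `flee` both have a category called `Other`, so after both sets of dummies are concatenated the frame holds two columns named `Other`, and `shooting['Other']` returns both. Passing `prefix` to `pd.get_dummies` keeps the names distinct, as this toy example shows:

```python
import pandas as pd

toy = pd.DataFrame({'race': ['White', 'Other'], 'flee': ['Other', 'Car']})
# without a prefix, both encodings produce a column called 'Other'
no_prefix = pd.concat([pd.get_dummies(toy['race']),
                       pd.get_dummies(toy['flee'])], axis=1)
# with a prefix, the columns become race_Other and flee_Other
with_prefix = pd.concat([pd.get_dummies(toy['race'], prefix='race'),
                         pd.get_dummies(toy['flee'], prefix='flee')], axis=1)
```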
shooting.iloc[:,0:18]
| | name | date | manner_of_death | armed | age | gender | city | state | signs_of_mental_illness | threat_level | body_camera | arms_category | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | arms_category_Unarmed | arms_category_Other unusual objects | arms_category_Blunt instruments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | 0 | gun | 53.0 | 0 | Shelton | WA | 1 | 1 | False | Guns | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | 0 | gun | 47.0 | 0 | Aloha | OR | 0 | 1 | False | Guns | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | John Paul Quintero | 2015-01-03 | 1 | unarmed | 23.0 | 0 | Wichita | KS | 0 | 0 | False | Unarmed | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | San Francisco | CA | 1 | 1 | False | Other unusual objects | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | Michael Rodriguez | 2015-01-04 | 0 | nail gun | 39.0 | 0 | Evans | CO | 0 | 1 | False | Piercing objects | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | 0 | Taser | 27.0 | 0 | Atlanta | GA | 0 | 1 | True | Electrical devices | 0 | 0 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | 0 | gun | 23.0 | 0 | Crown Point | IN | 0 | 1 | False | Guns | 1 | 0 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | 0 | unarmed | 25.0 | 1 | Sedalia | MO | 0 | 0 | False | Unarmed | 0 | 0 | 0 | 1 | 0 | 0 |
| 4893 | William Slyter | 2020-06-13 | 0 | gun | 22.0 | 0 | Kansas City | MO | 0 | 0 | False | Guns | 1 | 0 | 0 | 0 | 0 | 0 |
| 4894 | Nicholas Hirsh | 2020-06-15 | 0 | gun | 31.0 | 0 | Lawrence | KS | 0 | 1 | False | Guns | 1 | 0 | 0 | 0 | 0 | 0 |
4895 rows × 18 columns
shooting['body_camera'] = shooting['body_camera'].map({True:1, False:0}).astype(int)
shooting['body_camera']
0 0
1 0
2 0
3 0
4 0
..
4890 1
4891 0
4892 0
4893 0
4894 0
Name: body_camera, Length: 4895, dtype: int32
arms_cat_dummies = pd.get_dummies(shooting['arms_category'], drop_first=False)
shooting = pd.concat([shooting,arms_cat_dummies],axis=1)
shooting = shooting.drop('arms_category', axis = 1)
shooting
| | name | date | manner_of_death | armed | age | gender | city | state | signs_of_mental_illness | threat_level | ... | Explosives | Guns | Hand tools | Multiple | Other unusual objects | Piercing objects | Sharp objects | Unarmed | Unknown | Vehicles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | 0 | gun | 53.0 | 0 | Shelton | WA | 1 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Lewis Lee Lembke | 2015-01-02 | 0 | gun | 47.0 | 0 | Aloha | OR | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | John Paul Quintero | 2015-01-03 | 1 | unarmed | 23.0 | 0 | Wichita | KS | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Matthew Hoffman | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | San Francisco | CA | 1 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | Michael Rodriguez | 2015-01-04 | 0 | nail gun | 39.0 | 0 | Evans | CO | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | Rayshard Brooks | 2020-06-12 | 0 | Taser | 27.0 | 0 | Atlanta | GA | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4891 | Caine Van Pelt | 2020-06-12 | 0 | gun | 23.0 | 0 | Crown Point | IN | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4892 | Hannah Fizer | 2020-06-13 | 0 | unarmed | 25.0 | 1 | Sedalia | MO | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4893 | William Slyter | 2020-06-13 | 0 | gun | 22.0 | 0 | Kansas City | MO | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4894 | Nicholas Hirsh | 2020-06-15 | 0 | gun | 31.0 | 0 | Lawrence | KS | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4895 rows × 43 columns
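`arms_category` was already encoded into the ten `arms_category_*` columns, so `pd.get_dummies` here adds a second copy of those ten indicators (for example `Guns` duplicates `arms_category_Guns`) plus `Explosives` and `Hand tools`. A small sketch for spotting such duplicate columns by value. `duplicate_value_columns` is a hypothetical helper; it compares columns by position, so repeated column names such as the two `Other` columns do not break it, and by value, so an integer 0/1 column matches a boolean dummy column.

```python
import numpy as np
import pandas as pd

def duplicate_value_columns(df):
    """List (first, second) column-name pairs whose values are identical."""
    values = [df.iloc[:, i].to_numpy() for i in range(df.shape[1])]
    names = list(df.columns)
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if np.array_equal(values[i], values[j]):
                pairs.append((names[i], names[j]))
    return pairs

toy = pd.DataFrame({'arms_category_Guns': [1, 0, 1], 'age': [30, 41, 22]})
toy['Guns'] = pd.Series([True, False, True])
pairs = duplicate_value_columns(toy)
```

Running it on `shooting` at this point would list the redundant pairs, which could then be dropped before modelling.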
shooting = shooting.drop(['state', 'city','name'], axis = 1)
shooting
| | date | manner_of_death | armed | age | gender | signs_of_mental_illness | threat_level | body_camera | arms_category_Guns | arms_category_Sharp objects | ... | Explosives | Guns | Hand tools | Multiple | Other unusual objects | Piercing objects | Sharp objects | Unarmed | Unknown | Vehicles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | 0 | gun | 53.0 | 0 | 1 | 1 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2015-01-02 | 0 | gun | 47.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2015-01-03 | 1 | unarmed | 23.0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2015-01-04 | 0 | nail gun | 39.0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | 2020-06-12 | 0 | Taser | 27.0 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4891 | 2020-06-12 | 0 | gun | 23.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4892 | 2020-06-13 | 0 | unarmed | 25.0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4893 | 2020-06-13 | 0 | gun | 22.0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4894 | 2020-06-15 | 0 | gun | 31.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4895 rows × 40 columns
shooting['date'] = pd.to_datetime(shooting['date'])
# extract date parts with the vectorized .dt accessor
shooting['day'] = shooting['date'].dt.day
shooting['month'] = shooting['date'].dt.month
shooting['year'] = shooting['date'].dt.year
shooting
| | date | manner_of_death | armed | age | gender | signs_of_mental_illness | threat_level | body_camera | arms_category_Guns | arms_category_Sharp objects | ... | Multiple | Other unusual objects | Piercing objects | Sharp objects | Unarmed | Unknown | Vehicles | day | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | 0 | gun | 53.0 | 0 | 1 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 1 | 2015-01-02 | 0 | gun | 47.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 2 | 2015-01-03 | 1 | unarmed | 23.0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 | 2015 |
| 3 | 2015-01-04 | 0 | toy weapon | 32.0 | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| 4 | 2015-01-04 | 0 | nail gun | 39.0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | 2020-06-12 | 0 | Taser | 27.0 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4891 | 2020-06-12 | 0 | gun | 23.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4892 | 2020-06-13 | 0 | unarmed | 25.0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 13 | 6 | 2020 |
| 4893 | 2020-06-13 | 0 | gun | 22.0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 6 | 2020 |
| 4894 | 2020-06-15 | 0 | gun | 31.0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 6 | 2020 |
4895 rows × 43 columns
shooting = shooting.drop('date',axis =1)
shooting
| | manner_of_death | armed | age | gender | signs_of_mental_illness | threat_level | body_camera | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | ... | Multiple | Other unusual objects | Piercing objects | Sharp objects | Unarmed | Unknown | Vehicles | day | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | gun | 53.0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 1 | 0 | gun | 47.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 2 | 1 | unarmed | 23.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 | 2015 |
| 3 | 0 | toy weapon | 32.0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| 4 | 0 | nail gun | 39.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | 0 | Taser | 27.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4891 | 0 | gun | 23.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4892 | 0 | unarmed | 25.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 13 | 6 | 2020 |
| 4893 | 0 | gun | 22.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 6 | 2020 |
| 4894 | 0 | gun | 31.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 6 | 2020 |
4895 rows × 42 columns
shooting = shooting.drop('armed',axis = 1)
shooting.columns
Index(['manner_of_death', 'age', 'gender', 'signs_of_mental_illness',
'threat_level', 'body_camera', 'arms_category_Guns',
'arms_category_Sharp objects', 'arms_category_Unknown',
'arms_category_Unarmed', 'arms_category_Other unusual objects',
'arms_category_Blunt instruments', 'arms_category_Vehicles',
'arms_category_Multiple', 'arms_category_Piercing objects',
'arms_category_Electrical devices', 'Asian', 'Black', 'Hispanic',
'Native', 'Other', 'White', 'Car', 'Foot', 'Not fleeing', 'Other',
'Blunt instruments', 'Electrical devices', 'Explosives', 'Guns',
'Hand tools', 'Multiple', 'Other unusual objects', 'Piercing objects',
'Sharp objects', 'Unarmed', 'Unknown', 'Vehicles', 'day', 'month',
'year'],
dtype='object')
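The column list above contains `'Other'` twice (one from the race encoding, one from the flee encoding). The weapon categories are also encoded twice: once as the int32 `arms_category_*` columns and again as unprefixed uint8 dummies. With duplicate labels, `shooting['Other']` returns a two-column DataFrame rather than a Series. The sketch below assumes the unprefixed dummies came from `pd.get_dummies` (the uint8 dtype is consistent with that) and uses a small hypothetical frame to show how the `prefix` argument keeps every column name unique.

```python
import pandas as pd

# Hypothetical toy frame standing in for the raw 'race' and 'flee' columns
raw = pd.DataFrame({
    "race": ["White", "Black", "Other"],
    "flee": ["Not fleeing", "Car", "Other"],
})

# Without a prefix, both encodings create a column literally named 'Other'
collided = pd.concat([pd.get_dummies(raw["race"]), pd.get_dummies(raw["flee"])], axis=1)
print(list(collided.columns[collided.columns.duplicated()]))  # ['Other']

# With a prefix per source column, every dummy name is unique
prefixed = pd.get_dummies(raw, columns=["race", "flee"], prefix=["race", "flee"])
print(prefixed.columns.is_unique)  # True
print(list(prefixed.columns))
```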
shooting.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4895 entries, 0 to 4894
Data columns (total 41 columns):
 #   Column                               Non-Null Count  Dtype
---  ------                               --------------  -----
 0   manner_of_death                      4895 non-null   int64
 1   age                                  4895 non-null   float64
 2   gender                               4895 non-null   int64
 3   signs_of_mental_illness              4895 non-null   int32
 4   threat_level                         4895 non-null   int64
 5   body_camera                          4895 non-null   int32
 6   arms_category_Guns                   4895 non-null   int32
 7   arms_category_Sharp objects          4895 non-null   int32
 8   arms_category_Unknown                4895 non-null   int32
 9   arms_category_Unarmed                4895 non-null   int32
10   arms_category_Other unusual objects  4895 non-null   int32
11   arms_category_Blunt instruments      4895 non-null   int32
12   arms_category_Vehicles               4895 non-null   int32
13   arms_category_Multiple               4895 non-null   int32
14   arms_category_Piercing objects       4895 non-null   int32
15   arms_category_Electrical devices     4895 non-null   int32
16   Asian                                4895 non-null   uint8
17   Black                                4895 non-null   uint8
18   Hispanic                             4895 non-null   uint8
19   Native                               4895 non-null   uint8
20   Other                                4895 non-null   uint8
21   White                                4895 non-null   uint8
22   Car                                  4895 non-null   uint8
23   Foot                                 4895 non-null   uint8
24   Not fleeing                          4895 non-null   uint8
25   Other                                4895 non-null   uint8
26   Blunt instruments                    4895 non-null   uint8
27   Electrical devices                   4895 non-null   uint8
28   Explosives                           4895 non-null   uint8
29   Guns                                 4895 non-null   uint8
30   Hand tools                           4895 non-null   uint8
31   Multiple                             4895 non-null   uint8
32   Other unusual objects                4895 non-null   uint8
33   Piercing objects                     4895 non-null   uint8
34   Sharp objects                        4895 non-null   uint8
35   Unarmed                              4895 non-null   uint8
36   Unknown                              4895 non-null   uint8
37   Vehicles                             4895 non-null   uint8
38   day                                  4895 non-null   int64
39   month                                4895 non-null   int64
40   year                                 4895 non-null   int64
dtypes: float64(1), int32(12), int64(6), uint8(22)
memory usage: 602.4 KB
shooting
| | manner_of_death | age | gender | signs_of_mental_illness | threat_level | body_camera | arms_category_Guns | arms_category_Sharp objects | arms_category_Unknown | arms_category_Unarmed | ... | Multiple | Other unusual objects | Piercing objects | Sharp objects | Unarmed | Unknown | Vehicles | day | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 53.0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 1 | 0 | 47.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2015 |
| 2 | 1 | 23.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 | 2015 |
| 3 | 0 | 32.0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| 4 | 0 | 39.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 1 | 2015 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4890 | 0 | 27.0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4891 | 0 | 23.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 6 | 2020 |
| 4892 | 0 | 25.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 13 | 6 | 2020 |
| 4893 | 0 | 22.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 6 | 2020 |
| 4894 | 0 | 31.0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 6 | 2020 |
4895 rows × 41 columns
std_scaler = StandardScaler()
# Note: `shooting` still includes the target column manner_of_death at this point
std_scaler.fit(shooting)
StandardScaler()
pca = PCA(n_components=28)
scaled_data = std_scaler.transform(shooting)
pca.fit(scaled_data)
PCA(n_components=28)
scaled_data.shape
(4895, 41)
x_pca = pca.transform(scaled_data)
x_pca.shape
(4895, 28)
pca.components_
array([[ 1.58665094e-01, -4.17781576e-02, 3.84941665e-02, ...,
-2.60396986e-04, -5.43632131e-03, -2.40555784e-02],
[-7.23112520e-02, -1.77709848e-01, 1.25697512e-02, ...,
3.36141947e-02, 1.15954326e-02, 4.43426268e-02],
[ 7.20212199e-02, -1.46183080e-01, -6.94242563e-03, ...,
-1.06707136e-02, -6.37247156e-02, 3.25601848e-02],
...,
[-8.88904140e-02, -7.93972800e-02, 6.67531017e-02, ...,
-1.13267946e-02, -9.76975112e-02, 2.78377205e-01],
[-4.62211121e-02, 8.01240086e-01, -4.51234352e-03, ...,
9.10206149e-02, 7.70129389e-02, -6.17455872e-02],
[-2.88940371e-01, 1.86996106e-01, 5.14006269e-02, ...,
-9.03868687e-02, -4.65529555e-01, -6.07363810e-01]])
variance = np.array(pca.explained_variance_ratio_)
list(map('{:.2f}'.format,variance))
['0.08', '0.07', '0.06', '0.05', '0.05', '0.05', '0.05', '0.05', '0.05', '0.05', '0.03', '0.03', '0.03', '0.03', '0.03', '0.03', '0.03', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02', '0.02']
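Individually these ratios are small (the first component explains about 8% of the variance), so the number of components to keep is easier to judge from the running total. The sketch below uses random data as a hypothetical stand-in for `scaled_data` (same width, 41 columns); it computes the cumulative explained variance and shows that passing a float to `n_components` lets `PCA` pick the smallest count reaching that fraction. How the notebook arrived at 28 components is not stated, so this is one possible criterion, not the author's.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the scaled feature matrix (41 columns, like `shooting`)
rng = np.random.default_rng(0)
data = StandardScaler().fit_transform(rng.normal(size=(500, 41)))

# Fit with all components and accumulate the explained-variance ratios
full = PCA().fit(data)
cumulative = np.cumsum(full.explained_variance_ratio_)
n_for_95 = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print(f"components needed for 95% of the variance: {n_for_95}")

# A float n_components asks PCA to choose that number itself
auto = PCA(n_components=0.95).fit(data)
print(auto.n_components_)
```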
df_comp = pd.DataFrame(pca.components_,columns=shooting.columns)
plt.figure(figsize=(12,8))
sns.heatmap(df_comp,cmap='plasma',)
<AxesSubplot:>
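With 41 feature columns the heatmap is hard to read. A text alternative is to list, for each component, the features with the largest absolute loadings. The sketch below assumes a frame shaped like `df_comp` (one row per component, one column per feature); the loading values are made up for illustration, and `top_loadings` is a hypothetical helper.

```python
import pandas as pd

# Made-up loadings shaped like df_comp: rows are components, columns are features
loadings = pd.DataFrame(
    [[0.70, -0.10, 0.05, 0.60],
     [0.02, 0.90, -0.40, 0.10]],
    columns=["age", "day", "month", "year"],
)

def top_loadings(components: pd.DataFrame, k: int = 2) -> dict:
    """Return, for each component, the k features with the largest absolute loading."""
    return {
        f"PC{i + 1}": list(row.abs().nlargest(k).index)
        for i, (_, row) in enumerate(components.iterrows())
    }

print(top_loadings(loadings))
# {'PC1': ['age', 'year'], 'PC2': ['day', 'month']}
```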
shooting.keys()
Index(['manner_of_death', 'age', 'gender', 'signs_of_mental_illness',
'threat_level', 'body_camera', 'arms_category_Guns',
'arms_category_Sharp objects', 'arms_category_Unknown',
'arms_category_Unarmed', 'arms_category_Other unusual objects',
'arms_category_Blunt instruments', 'arms_category_Vehicles',
'arms_category_Multiple', 'arms_category_Piercing objects',
'arms_category_Electrical devices', 'Asian', 'Black', 'Hispanic',
'Native', 'Other', 'White', 'Car', 'Foot', 'Not fleeing', 'Other',
'Blunt instruments', 'Electrical devices', 'Explosives', 'Guns',
'Hand tools', 'Multiple', 'Other unusual objects', 'Piercing objects',
'Sharp objects', 'Unarmed', 'Unknown', 'Vehicles', 'day', 'month',
'year'],
dtype='object')
shooting.shape
(4895, 41)
X = shooting.drop('manner_of_death', axis =1)
y = shooting['manner_of_death']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X.shape
(4895, 40)
y.shape
(4895,)
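# Refit the PCA on the training split only, then project both splits onto its components.
# Note: unlike the earlier PCA cell, X_train/X_test are not standardized here, so
# high-variance columns such as age and day dominate the leading components.
pca.fit(X_train)
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)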
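In this notebook the scaler and the first PCA were fit on the full `shooting` frame (target included), and the refit PCA above runs on unscaled features. A common alternative is to chain scaling, PCA and the classifier in a scikit-learn `Pipeline`, so that `GridSearchCV` refits the preprocessing inside every CV fold using that fold's training part only. The sketch below uses `make_classification` as a synthetic stand-in for `X` and `y` (about 5% positives) and stratifies the split, which the original code does not do; the `*_demo` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y: 40 features, roughly 5% positive class
X_demo, y_demo = make_classification(n_samples=1000, n_features=40, weights=[0.95],
                                     random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo)

# Scaling and PCA are pipeline steps, so each CV fold refits them on its own training part
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=28)),
    ("clf", LogisticRegression(solver="liblinear", random_state=0)),
])
# Step parameters are addressed as <step name>__<parameter>
param_grid = {"clf__penalty": ["l1", "l2"], "clf__C": [0.01, 0.1, 1, 10, 100]}

grid = GridSearchCV(pipe, param_grid, cv=10)
grid.fit(Xd_train, yd_train)
print(grid.best_params_)
print("Test score: {:.4f}".format(grid.score(Xd_test, yd_test)))
```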
logreg = LogisticRegression(random_state = 0, solver = 'liblinear',multi_class='auto')
logreg_parameters = {'penalty':('l1', 'l2'), 'C':[0.01, 0.1, 1, 10, 100]}
logreg_grid = GridSearchCV(logreg, logreg_parameters,cv=10)
logreg_grid.fit(X_train, y_train)
GridSearchCV(cv=10,
estimator=LogisticRegression(random_state=0, solver='liblinear'),
param_grid={'C': [0.01, 0.1, 1, 10, 100], 'penalty': ('l1', 'l2')})
logreg_grid.best_params_
{'C': 0.01, 'penalty': 'l1'}
logreg_test = LogisticRegression(random_state = 0, solver = 'liblinear',multi_class='auto',C=0.01,penalty='l1')
logreg_test.fit(X_train, y_train)
print('Train score: {:.4f}'.format(logreg_test.score(X_train, y_train)))
print('Test score: {:.4f}'.format(logreg_test.score(X_test, y_test)))
Train score: 0.9521
Test score: 0.9428
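The confusion matrices reported for the later models show 1385 class-0 rows and 84 class-1 rows in the test split, so predicting class 0 for every row already scores 1385/1469 ≈ 0.9428, the same as this test score. A majority-class baseline makes that comparison explicit. The sketch below rebuilds labels with those class counts; in the notebook one would instead fit `DummyClassifier` on `X_train, y_train` and score it on `X_test, y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Labels with the same class counts as the notebook's test split: 1385 zeros, 84 ones
y_counts = np.array([0] * 1385 + [1] * 84)
X_placeholder = np.zeros((len(y_counts), 1))  # this strategy ignores the features

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_placeholder, y_counts)
print("Baseline accuracy: {:.4f}".format(baseline.score(X_placeholder, y_counts)))
# Baseline accuracy: 0.9428
```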
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn_parameters = {'n_neighbors':np.arange(1,20,1)}
knn_grid = GridSearchCV(knn, knn_parameters,cv=10)
knn_grid.fit(X_train, y_train)
GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19])})
knn_grid.best_params_
{'n_neighbors': 2}
knn_test = KNeighborsClassifier(n_neighbors=2, leaf_size= 30, metric='minkowski', p=2, weights= 'uniform')
knn_test.fit(X_train, y_train)
print('Train score: {:.4f}'.format(knn_test.score(X_train, y_train)))
print('Test score: {:.4f}'.format(knn_test.score(X_test, y_test)))
Train score: 0.9545
Test score: 0.9415
knn_prediction = knn_test.predict(X_test)
print('Confusion Matrix \n', confusion_matrix(y_test, knn_prediction))
print(classification_report(y_test, knn_prediction))
print('Precision score: ', precision_score( y_test, knn_prediction, average = 'macro'))
print('Recall Score: ', recall_score( y_test, knn_prediction, average = 'macro'))
Confusion Matrix
[[1383 2]
[ 84 0]]
precision recall f1-score support
0 0.94 1.00 0.97 1385
1 0.00 0.00 0.00 84
accuracy 0.94 1469
macro avg 0.47 0.50 0.48 1469
weighted avg 0.89 0.94 0.91 1469
Precision score: 0.47137014314928427
Recall Score: 0.49927797833935017
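The macro averages are unweighted means of the per-class values, which is why a class with zero precision and recall pulls both scores down to about half. The arithmetic below reproduces the two numbers above directly from the KNN confusion matrix (rows are true classes, columns are predicted classes).

```python
# Confusion matrix from the KNN cell: rows = true class, columns = predicted class
cm = [[1383, 2],
      [84, 0]]

# Precision = correct predictions of a class / all predictions of that class (column sum)
precision_0 = cm[0][0] / (cm[0][0] + cm[1][0])   # 1383 / 1467
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])   # 0 / 2
# Recall = correct predictions of a class / all true members of that class (row sum)
recall_0 = cm[0][0] / (cm[0][0] + cm[0][1])      # 1383 / 1385
recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])      # 0 / 84

macro_precision = (precision_0 + precision_1) / 2
macro_recall = (recall_0 + recall_1) / 2
print(macro_precision, macro_recall)
```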
from sklearn.svm import LinearSVC
linr = LinearSVC(random_state = 0, dual=False)
linr_parameters = {'penalty':('l1', 'l2'), 'C':[0.001,0.01,0.1,1,10,100] }
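# Note: `iid` was deprecated in scikit-learn 0.22 and removed in 0.24 (see the warning below); omit it on newer releases
linr_grid = GridSearchCV(linr, linr_parameters,cv=10,iid=True)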
linr_grid.fit(X_train, y_train)
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:847: FutureWarning: The parameter 'iid' is deprecated in 0.22 and will be removed in 0.24.
GridSearchCV(cv=10, estimator=LinearSVC(dual=False, random_state=0), iid=True,
param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100],
'penalty': ('l1', 'l2')})
linr_grid.best_params_
{'C': 0.001, 'penalty': 'l1'}
linr_test = LinearSVC(random_state = 0,dual=False,C= 0.001, penalty= 'l1')
linr_test.fit(X_train, y_train)
print('Train score: {:.4f}'.format(linr_test.score(X_train, y_train)))
print('Test score: {:.4f}'.format(linr_test.score(X_test, y_test)))
Train score: 0.9521
Test score: 0.9428
lnsvc_prediction = linr_test.predict(X_test)
print('Confusion Matrix \n', confusion_matrix(y_test, lnsvc_prediction))
print(classification_report(y_test, lnsvc_prediction))
print('Precision score: ', precision_score( y_test, lnsvc_prediction, average = 'macro'))
print('Recall Score: ', recall_score( y_test, lnsvc_prediction, average = 'macro'))
Confusion Matrix
[[1385 0]
[ 84 0]]
precision recall f1-score support
0 0.94 1.00 0.97 1385
1 0.00 0.00 0.00 84
accuracy 0.94 1469
macro avg 0.47 0.50 0.49 1469
weighted avg 0.89 0.94 0.92 1469
Precision score: 0.4714091218515997
Recall Score: 0.5
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
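The linear SVC never predicts class 1, so that class has zero recall and undefined precision (hence the warnings). One common adjustment, not tried in this notebook, is `class_weight="balanced"`, which weights each class's errors inversely to its frequency. The sketch below compares unweighted and balanced `LinearSVC` on synthetic imbalanced data (a stand-in, not the shootings data), using the default `C=1.0` rather than the notebook's `C=0.001`. Passing `zero_division=0` makes the handling of undefined metrics explicit and suppresses the warning.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic imbalanced stand-in (roughly 5% positives); not the shootings data
X_demo, y_demo = make_classification(n_samples=2000, n_features=20, weights=[0.95],
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          random_state=0, stratify=y_demo)

minority_recall = {}
for weights in (None, "balanced"):
    model = LinearSVC(dual=False, class_weight=weights, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # zero_division=0 reports 0.0 (without a warning) if the metric is undefined
    minority_recall[weights] = recall_score(y_te, pred, pos_label=1, zero_division=0)

print(minority_recall)
```

Reweighting usually trades some class-0 accuracy for class-1 recall, so overall accuracy can drop even when the model becomes more useful for the minority class.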
from sklearn.svm import SVC
svm=SVC(kernel='linear',random_state=0)
#svm_parameters = {'gamma': [0.001, 0.01, 0.1, 1], 'C': [0.001, 0.01, 0.1, 1]}
#svm_grid = GridSearchCV(svm, svm_parameters,cv=3, n_jobs=4)
#svm_grid.fit(X_train, y_train)
# The grid search above is commented out; these values are set by hand: C = 0.001, gamma = 0.001
svm_grid_para = {'C': 0.001,'gamma':0.001}
svm_grid_para
{'C': 0.001, 'gamma': 0.001}
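The commented-out grid searches over `gamma`, but scikit-learn's `SVC` only uses `gamma` for the `'rbf'`, `'poly'` and `'sigmoid'` kernels; with `kernel='linear'` it has no effect, so only `C` matters here. The sketch below, on synthetic data, fits two linear SVCs that differ only in `gamma` and checks that they learn the same model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data for illustration only
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Same linear kernel and C, very different gamma values
a = SVC(kernel="linear", C=1.0, gamma=0.001).fit(X_demo, y_demo)
b = SVC(kernel="linear", C=1.0, gamma=100.0).fit(X_demo, y_demo)

print(np.array_equal(a.predict(X_demo), b.predict(X_demo)))  # True
print(np.allclose(a.coef_, b.coef_))  # True
```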
from sklearn.svm import SVC
# gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels; with kernel='linear' it is ignored
svm_test = SVC(kernel='linear',random_state=0,C= 0.001, gamma= 0.001)
svm_test.fit(X_train, y_train)
print('Train score: {:.4f}'.format(svm_test.score(X_train, y_train)))
print('Test score: {:.4f}'.format(svm_test.score(X_test, y_test)))
Train score: 0.9521
Test score: 0.9428
svm_prediction = svm_test.predict(X_test)
print('Confusion Matrix \n', confusion_matrix(y_test, svm_prediction))
print(classification_report(y_test, svm_prediction))
print('Precision score: ', precision_score( y_test, svm_prediction, average = 'macro'))
print('Recall Score: ', recall_score( y_test, svm_prediction, average = 'macro'))
Confusion Matrix
[[1385 0]
[ 84 0]]
precision recall f1-score support
0 0.94 1.00 0.97 1385
1 0.00 0.00 0.00 84
accuracy 0.94 1469
macro avg 0.47 0.50 0.49 1469
weighted avg 0.89 0.94 0.92 1469
Precision score: 0.4714091218515997
Recall Score: 0.5
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier(random_state=0)
dtree_parameters = {'criterion': ['gini', 'entropy'],
'splitter': ['best', 'random'],
'max_depth': [1, 2, 3, 4, 5, 6, 7],
'max_features': [1, 2, 3, 4, 5, 6],
'min_samples_split':[5, 10, 15, 20, 25],
'min_samples_leaf':[1, 2, 3, 4, 5, 6]}
dtree_grid = GridSearchCV(dtree, dtree_parameters,cv=10)
dtree_grid.fit(X_train, y_train)
GridSearchCV(cv=10, estimator=DecisionTreeClassifier(random_state=0),
param_grid={'criterion': ['gini', 'entropy'],
'max_depth': [1, 2, 3, 4, 5, 6, 7],
'max_features': [1, 2, 3, 4, 5, 6],
'min_samples_leaf': [1, 2, 3, 4, 5, 6],
'min_samples_split': [5, 10, 15, 20, 25],
'splitter': ['best', 'random']})
dtree_grid.best_params_
{'criterion': 'gini',
'max_depth': 7,
'max_features': 2,
'min_samples_leaf': 3,
'min_samples_split': 5,
'splitter': 'random'}
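This grid has 2 × 2 × 7 × 6 × 5 × 6 = 5,040 parameter combinations, and with `cv=10` the search fits 50,400 trees before the final refit. `ParameterGrid` confirms the count; when a grid is this large, `RandomizedSearchCV` with a fixed `n_iter` is a cheaper option to consider.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as dtree_parameters above
dtree_parameters = {'criterion': ['gini', 'entropy'],
                    'splitter': ['best', 'random'],
                    'max_depth': [1, 2, 3, 4, 5, 6, 7],
                    'max_features': [1, 2, 3, 4, 5, 6],
                    'min_samples_split': [5, 10, 15, 20, 25],
                    'min_samples_leaf': [1, 2, 3, 4, 5, 6]}

n_candidates = len(ParameterGrid(dtree_parameters))
print(n_candidates)        # 5040
print(n_candidates * 10)   # 50400 fits with cv=10
```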
dtree_test = DecisionTreeClassifier(random_state=0,max_depth=7 ,criterion='gini',max_features=2,min_samples_leaf=3,min_samples_split=5,splitter='random')
dtree_test.fit(X_train, y_train)
plt.figure(figsize = (20,8))
tree.plot_tree(dtree_test,precision=4,impurity=True,filled=True)
plt.show()
print('Train score: {:.4f}'.format(dtree_test.score(X_train, y_train)))
print('Test score: {:.4f}'.format(dtree_test.score(X_test, y_test)))
Train score: 0.9527
Test score: 0.9415
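Each final model in this notebook is rebuilt by retyping `best_params_` by hand. Because `GridSearchCV` refits the best configuration on the whole training set by default (`refit=True`), `best_estimator_` can be used directly, which avoids transcription slips. The sketch below uses synthetic data and a small illustrative subset of the tree grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the grid is a small illustrative subset of dtree_parameters
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": [1, 3, 5], "criterion": ["gini", "entropy"]}, cv=5)
grid.fit(X_demo, y_demo)

# With refit=True (the default), best_estimator_ is already trained on all of X_demo
best_tree = grid.best_estimator_
params = best_tree.get_params()
print(all(params[name] == value for name, value in grid.best_params_.items()))  # True
print("Train score: {:.4f}".format(best_tree.score(X_demo, y_demo)))
```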
# Final evaluation uses the linear-kernel SVM (svm_test), not the decision tree fitted above
final_test_prediction = svm_test.predict(X_test)
print('Confusion Matrix \n', confusion_matrix(y_test, final_test_prediction))
Confusion Matrix
[[1385 0]
[ 84 0]]
print(classification_report(y_test, final_test_prediction))
precision recall f1-score support
0 0.94 1.00 0.97 1385
1 0.00 0.00 0.00 84
accuracy 0.94 1469
macro avg 0.47 0.50 0.49 1469
weighted avg 0.89 0.94 0.92 1469
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
print('Precision score: ', precision_score( y_test, final_test_prediction, average = 'macro'))
print('Recall Score: ', recall_score( y_test, final_test_prediction, average = 'macro'))
Precision score:  0.4714091218515997
Recall Score:  0.5
C:\Users\vochi\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1221: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
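Every model above reaches roughly 0.94 accuracy while never (or almost never) predicting class 1, so plain accuracy cannot distinguish them from always guessing class 0. Balanced accuracy, the mean of the per-class recalls, does: for the final SVM predictions it equals the macro recall of 0.5. The sketch below rebuilds labels from the final confusion matrix (1385 true zeros, 84 true ones, all predicted as 0).

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Rebuilt from the final confusion matrix: every test row is predicted as class 0
y_true = np.array([0] * 1385 + [1] * 84)
y_pred = np.zeros_like(y_true)

print("Accuracy: {:.4f}".format(accuracy_score(y_true, y_pred)))                     # 0.9428
print("Balanced accuracy: {:.4f}".format(balanced_accuracy_score(y_true, y_pred)))  # 0.5000
```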